EVENT
OPTIMAL EXECUTION OF SCIENTIFIC WORKFLOWS IN IN-MEMORY DATAFLOW FRAMEWORKS
Event type: Qualifying Exam
The volume of data produced by scientific simulations and experiments has been increasing at an astronomical rate. Scientific applications that consume large volumes of data are normally defined as workflows. There has been substantial progress in the parallel execution of scientific workflows on shared-nothing clusters. However, most current Scientific Workflow Management Systems do not handle memory and data locality appropriately. Apache Spark addresses these issues by chaining activities that should be executed locally, along with other optimizations such as in-memory storage of intermediate data and caching of pre-computed values. However, Spark requires existing workflows to be described using its own API, which forces the activities to be implemented in Python, Java, Scala or R in order to take advantage of the RDD (Resilient Distributed Dataset), an in-memory data abstraction that allows Spark to execute a chain of activities efficiently. In this qualification proposal, we describe a project to develop a Scientific Workflow Management System called TARDIS, whose objective is to run existing workflows (e.g. those designed for Pegasus or Chiron) inside a Spark cluster, using RDDs and smart caching, in a way that is completely transparent to the user.
Start date: 22/11/2016 Time: 13:30 End date: 22/11/2016 Time: 15:30
Venue: LNCC - Laboratório Nacional de Computação Científica - Auditorium B
Student: Daniel Gaspar Gonçalves de Souza - Universidade Católica de Petrópolis - UCP
Advisor: Fabio André Machado Porto - Laboratório Nacional de Computação Científica - LNCC
Examination committee members: Antônio Tadeu Azevedo Gomes - Laboratório Nacional de Computação Científica - LNCC; Artur Ziviani - Laboratório Nacional de Computação Científica - LNCC; Daniel de Oliveira